STA 9750 Final-Project: A Tale of 59 NYC Community Districts

Author

Eduardo Alarcon

Introduction

The Overarching and Specific Questions

The COVID-19 pandemic established new urban norms, transforming job proximity from a strict requirement into a flexible option through remote work. This paradigm shift necessitates an analysis of how traditional value determinants adapted to these new dynamics. Our research team’s work addresses the overarching question (OQ):

Did COVID-19 reshape the relationship between neighborhood characteristics and property values across NYC’s CDs?

While the team’s broader effort examines crime, density, job accessibility, and transit, this analysis focuses on Educational Attainment (EA). Historically, high-education neighborhoods commanded significant price premiums due to well-known agglomeration effects. However, the pandemic disrupted these patterns, which motivates this research’s specific question (SQ):

Did the strength of the relationship between neighborhood educational attainment and property values change post-COVID, and did this change differ across NYC’s Community Districts?

Hypotheses

Given the pandemic’s impact on work-life demands and its redefinition of dwelling needs, this question explores two hypotheses.

Hypothesis 1: The positive correlation between education and property values would strengthen post-COVID.

Hypothesis 2: High-education CDs would experience stronger property value growth post-COVID as remote work freed professionals to prioritize neighborhood amenities over commute times.

Beyond testing these hypotheses, this analysis also investigates whether pandemic-era shifts represent temporary disruption or a systemic realignment of urban real estate economics.

Data Acquisition and Processing

This analysis integrates four data sources through programmatic acquisition, requiring careful transformation to align incompatible geographic coding systems across federal and city databases.

Setup and Configuration

Loading the required packages and defining global constants makes it possible to programmatically assemble a 59-CD panel for 2017–2019 vs. 2021–2023 and merge it with baseline 2019 American Community Survey (ACS) education variables.

NYC Community District Shapefile

The get_nyc_cd() function retrieves the official shapefile from the NYC Department of City Planning, unzips it, and constructs standardized identifiers for all 59 Community Districts.

PLUTO BBL-to-CD Crosswalk

The get_pluto_cd_crosswalk() function uses raw PLUTO data from NYC Open Data to build a Borough–Block–Lot (BBL) to CD crosswalk. This process normalizes approximately 858,000 BBLs, resolving formatting inconsistencies and creating standardized linkages between individual tax lots and CDs.

This process removed 1,154 invalid entries (0.1%), preventing silent join failures downstream, as shown in Table 1 below.

Table 1: PLUTO BBL Normalization Results
Stage Count Percentage
Raw PLUTO records 858,284 100%
After BBL normalization 857,130 99.9%
Records removed 1,154 0.1%
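The normalization rule can be illustrated with a minimal Python sketch. This is not the project’s `get_pluto_cd_crosswalk()` code. It assumes that a valid BBL reduces to ten digits: a 1-digit borough code from 1 to 5, a 5-digit block, and a 4-digit lot. The helper `normalize_bbl` is hypothetical.

```python
import re
from typing import Optional


def normalize_bbl(raw) -> Optional[str]:
    """Return a 10-digit BBL string (borough + 5-digit block + 4-digit lot), or None if invalid.

    Assumption: raw values may arrive as ints, floats rendered like "1000010001.0",
    or strings with separators such as "1-00001-0001".
    """
    if raw is None:
        return None
    text = str(raw).strip()
    # Drop a trailing ".0" left over when numeric BBLs were read as floats.
    if text.endswith(".0"):
        text = text[:-2]
    # Keep digits only, removing dashes, spaces, or slashes.
    digits = re.sub(r"\D", "", text)
    if len(digits) != 10:
        return None
    borough, block, lot = digits[0], digits[1:6], digits[6:]
    # Borough codes run 1-5; an all-zero block or lot is treated as invalid.
    if borough not in "12345" or int(block) == 0 or int(lot) == 0:
        return None
    return borough + block + lot


raw_bbls = [1000010001, "1000010001.0", "3-01234-0056", "9000010001", "", None]
clean = [normalize_bbl(b) for b in raw_bbls]
valid = [b for b in clean if b is not None]
print(clean)
print(f"removed {len(raw_bbls) - len(valid)} of {len(raw_bbls)}")
```

The design choice is to reject malformed keys at this stage. A malformed key that reached the join would simply fail to match and vanish from the results without any warning.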

Department of Finance Rolling Sales Data

The get_dof_sales_year_boro() function automates the collection and cleaning of Annualized Sales reports from the NYC Department of Finance (DOF). Quality filters remove transactions under $10,000 and restrict the data to residential tax classes (e.g., 1, 2, 2A, 2B, and 2C).
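Both quality filters can be expressed as a single predicate. In this hypothetical Python sketch, the `Sale` record and its field names are assumptions and do not reflect the DOF column names. The tax-class list is the one given above.

```python
from dataclasses import dataclass

# Residential tax classes listed in the text (assumed to be compared as strings).
RESIDENTIAL_TAX_CLASSES = {"1", "2", "2A", "2B", "2C"}
MIN_PRICE = 10_000  # sales below this amount are dropped


@dataclass
class Sale:
    bbl: str
    tax_class: str
    price: float


def keep_sale(sale: Sale) -> bool:
    """Apply both quality filters: a minimum price and a residential tax class."""
    return sale.price >= MIN_PRICE and sale.tax_class.strip().upper() in RESIDENTIAL_TAX_CLASSES


sales = [
    Sale("1000010001", "1", 850_000),
    Sale("1000010002", "2A", 5_000),     # dropped: below $10,000
    Sale("1000010003", "4", 2_000_000),  # dropped: non-residential tax class
    Sale("2000020001", "2c", 420_000),   # kept after case normalization
]
filtered = [s for s in sales if keep_sale(s)]
print([s.bbl for s in filtered])
```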

Enhanced BBL Matching: Two-Stage Approach


A two-stage join strategy combines exact BBL matching with a block-level fallback. Standard exact-match joins lose roughly a quarter of transactions (27.5% in Table 2) because condo sales carry billing BBLs. The fallback therefore joins each remaining unmatched sale to the CD of its city block. This yields a 100% match rate and reduces inaccuracies in high-density areas, as shown in Table 2.

Table 2: Enhanced BBL Matching Results: Two-Stage Approach
Stage Count Percentage
Total sales (all files) 339,826 100%
Stage 1: Exact BBL matches 246,392 72.5%
Stage 2: Block-level matches 93,434 27.5%
Total matched to CDs 339,826 100%
Overall match rate 100%
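The two-stage logic can be sketched in Python under a few assumptions:

- the crosswalk is a dictionary from 10-digit BBL to CD identifier;
- the first six digits (borough + block) identify a city block;
- a condo billing BBL shares its block with lots that do appear in the crosswalk.

Function names are hypothetical, and this is not the project’s code.

```python
from typing import Dict, Optional, Tuple


def build_block_lookup(crosswalk: Dict[str, str]) -> Dict[str, str]:
    """Map each borough+block prefix (first 6 BBL digits) to a CD.

    Assumption: a block lies within one CD; if a block maps to several CDs,
    the first one seen is kept, which a production pipeline would need to handle.
    """
    lookup: Dict[str, str] = {}
    for bbl, cd in crosswalk.items():
        lookup.setdefault(bbl[:6], cd)
    return lookup


def match_sale(bbl: str, crosswalk: Dict[str, str], blocks: Dict[str, str]) -> Tuple[Optional[str], str]:
    """Return (cd, stage): exact BBL match first, then the block-level fallback."""
    if bbl in crosswalk:
        return crosswalk[bbl], "exact"
    if bbl[:6] in blocks:
        return blocks[bbl[:6]], "block"
    return None, "unmatched"


crosswalk = {"1000010001": "MN01", "3012340056": "BK02"}
blocks = build_block_lookup(crosswalk)
# "3012347501" stands in for a condo billing BBL that is absent from the crosswalk.
for bbl in ["1000010001", "3012347501", "4099990001"]:
    print(bbl, match_sale(bbl, crosswalk, blocks))
```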

Note: The code block below handles all data acquisition and pre-processing to establish the foundational datasets for this research. These background processes include structural validations to ensure data integrity before analysis.

American Community Survey Education Data

Sourced from Table B15003 (educational attainment for the population 25 years and over), the ACS dataset reports educational variables for approximately 2,200 census tracts. Tract boundaries do not align perfectly with NYC CD boundaries. This analysis therefore uses area-weighted aggregation, a geographic method that redistributes each tract’s counts to the CDs it overlaps in proportion to the overlapping area, so that 100% of each tract’s population is allocated.

\[\text{CD}_{\text{Pop BA+}} = \sum_{\text{tracts intersecting the CD}} \left( \text{Tract}_{\text{Pop BA+}} \times \frac{\text{Intersection Area}}{\text{Tract Area}} \right)\]
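A minimal Python sketch of this area-weighted interpolation follows. It assumes the tract-by-CD overlay (intersection areas) has already been computed with a GIS tool, and that population is spread evenly within each tract. Forming the BA+ share from area-weighted numerators and denominators (adults 25+) is an assumption about how the percentage is built. All tract IDs and counts below are made up.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def area_weighted_totals(
    pieces: List[Tuple[str, str, float]],
    tract_area: Dict[str, float],
    tract_count: Dict[str, float],
) -> Dict[str, float]:
    """Allocate tract-level counts to CDs in proportion to overlapping area.

    pieces: (tract_id, cd_id, intersection_area) rows from a tract-by-CD overlay.
    Assumes people are spread evenly across each tract's area.
    """
    totals: Dict[str, float] = defaultdict(float)
    for tract, cd, inter_area in pieces:
        weight = inter_area / tract_area[tract]  # share of the tract inside this CD
        totals[cd] += tract_count[tract] * weight
    return dict(totals)


# Toy overlay: tract T1 lies entirely in CD_A; tract T2 is split 25% / 75%.
tract_area = {"T1": 1.0, "T2": 2.0}
pieces = [("T1", "CD_A", 1.0), ("T2", "CD_A", 0.5), ("T2", "CD_B", 1.5)]
ba_plus = {"T1": 300, "T2": 1200}  # adults 25+ with a BA or higher (made up)
pop_25 = {"T1": 1000, "T2": 2000}  # all adults 25+ (made up)

totals_ba = area_weighted_totals(pieces, tract_area, ba_plus)
totals_pop = area_weighted_totals(pieces, tract_area, pop_25)
share = {cd: 100.0 * totals_ba[cd] / totals_pop[cd] for cd in totals_ba}
print(share)
```

When every tract is fully covered by the overlay, the allocated counts sum back to the tract totals (here 1,500 BA+ adults). That is the sense in which the whole population is accounted for.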

Treating EA as a fixed baseline is a critical research control because it separates changes in market demand from potential confounders. If education data were updated post-COVID, it would be impossible to tell whether price changes reflect shifting preferences or shifting populations. Holding education constant at 2019 levels therefore ensures that measured changes reflect how the market values a fixed neighborhood characteristic, rather than changes in the characteristic itself.

Table 3 below demonstrates this fixed baseline strategy. The same 2019 ACS education data is applied consistently across all six research years (2017–2019 and 2021–2023), which allows a clean isolation of pandemic-era market shifts.

Table 3: Sales Years Matched to Fixed ACS Baseline (2019)
Year Period ACS Baseline CDs
2017 Pre-COVID 2019 59
2018 Pre-COVID 2019 59
2019 Pre-COVID 2019 59
2021 Post-COVID 2019 59
2022 Post-COVID 2019 59
2023 Post-COVID 2019 59

Temporal Scope and Final Integration

The core of this analysis compares two three-year periods: pre-COVID (2017-2019) and post-COVID (2021-2023).

Table 4: Temporal Scope of Analysis
Component Detail
Pre-COVID sales period 2017, 2018, 2019 (3 years)
Post-COVID sales period 2021, 2022, 2023 (3 years)
Excluded year 2020 (pandemic disruption)
Education data (baseline) ACS 2015–2019 (5-year), used as baseline for both periods

Omitting 2020 is essential to ensure analysis integrity. Given the acute shocks from this year, any statistical anomalies may distort long-term trend analysis; thus, bypassing it yields a clearer view of the market’s post-COVID response.

The diagnostics in Table 5 confirm that median prices by CD-year were merged with the education baseline held constant at 2019 levels. Each CD-period contains exactly three years, one BA+ value, and one ACS vintage.

Table 5: Merged Panel Diagnostics
Min Years (CD Period) Max Years (CD Period) Min BA+ Values Max BA+ Values Min ACS Years Max ACS Years
3 3 1 1 1 1

Together, the steps in the Data Acquisition and Processing section create a balanced panel of 354 CD-year observations: 59 CDs, each with six years of data. This structure supports this research’s Difference-in-Differences (DiD) analysis, so that identified trends can be linked to the post-pandemic period rather than to gaps in the data.
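The period split and the balance check summarized in Table 5 can be sketched as follows. The CD identifiers are placeholders, and `period_of` and `is_balanced` are hypothetical helpers, not the project’s functions.

```python
from collections import Counter
from typing import List, Optional, Tuple

PRE_YEARS = {2017, 2018, 2019}
POST_YEARS = {2021, 2022, 2023}


def period_of(year: int) -> Optional[str]:
    """Label a sales year; 2020 (and any year outside the window) returns None."""
    if year in PRE_YEARS:
        return "Pre-COVID"
    if year in POST_YEARS:
        return "Post-COVID"
    return None


def is_balanced(rows: List[Tuple[str, int]], n_cds: int) -> bool:
    """rows: (cd_id, year) pairs. Balanced means n_cds CDs, each with exactly
    three years in each period, and no extra rows (n_cds * 6 in total)."""
    counts = Counter((cd, period_of(y)) for cd, y in rows if period_of(y))
    cds = {cd for cd, _ in rows}
    return (
        len(cds) == n_cds
        and all(counts[(cd, p)] == 3 for cd in cds for p in ("Pre-COVID", "Post-COVID"))
        and len(rows) == n_cds * 6
    )


cd_ids = [f"CD{i:02d}" for i in range(1, 60)]  # 59 placeholder IDs
years = sorted(PRE_YEARS | POST_YEARS)
panel = [(cd, y) for cd in cd_ids for y in years]
print(len(panel), is_balanced(panel, n_cds=59))
```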


Pre-COVID Analytical Framework

Creating the Analysis Set

This baseline analysis addresses the OQ by establishing EA as a strong neighborhood predictor pre-pandemic. Quantifying this “education premium” establishes the context for the study to determine whether the pandemic weakened or strengthened the link between EA and property values.

To analyze how these different CDs responded to the pandemic, this research applied a non-parametric tercile grouping approach. It stratifies CDs by EA into “Low,” “Medium,” and “High” tiers to limit the influence of outliers when comparing distributions.

Table 6 shows a clear delineation: Low-education CDs average 19.3% BA+ Attainment, while High-education CDs average 59.6% (median 52.8%). This variation establishes a tangible baseline for comparing how CD housing markets evolved during the pandemic era.

Table 6. Education Tercile Ranges
Education Group Number of CDs Min BA+ (%) Max BA+ (%) Mean BA+ (%) Median BA+ (%)
Low 20 11.7 27.3 19.3 19.4
Medium 19 28.5 40.2 34.1 34.5
High 20 40.5 82.5 59.6 52.8

This tercile structure enables parallel-trends testing.

Note: The roughly 40 percentage point (pp) gap between the High and Low tercile means (59.6% vs. 19.3%) is used later in the internal consistency check of the regression analysis.
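One way to form the terciles is sketched below. It assumes cut points at the 1/3 and 2/3 quantiles of the 59 BA+ values, with linear interpolation between sorted values. With 59 distinct values, this rule yields groups of 20, 19, and 20 CDs, which matches the counts in Table 6 (Education Tercile Ranges). The project’s exact rule is not shown here, and the BA+ values below are placeholders.

```python
import statistics
from typing import Dict


def assign_terciles(ba_pct: Dict[str, float]) -> Dict[str, str]:
    """Label each CD Low/Medium/High using cut points at the 1/3 and 2/3 quantiles.

    method="inclusive" interpolates linearly between sorted values
    (the same rule as R's default type-7 quantile).
    """
    q1, q2 = statistics.quantiles(ba_pct.values(), n=3, method="inclusive")
    labels = {}
    for cd, value in ba_pct.items():
        if value <= q1:
            labels[cd] = "Low"
        elif value <= q2:
            labels[cd] = "Medium"
        else:
            labels[cd] = "High"
    return labels


example = {f"CD{i:02d}": float(i) for i in range(1, 60)}  # 59 placeholder BA+ values
labels = assign_terciles(example)
print({g: list(labels.values()).count(g) for g in ("Low", "Medium", "High")})
```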

Pre-Trend Diagnostics

A DiD design helps filter out factors that affect all groups alike (e.g., citywide economic shifts), as well as stable neighborhood characteristics, which allows a strict focus on the pandemic’s effect. This research evaluates the Parallel-Trends Assumption (PTA). The PTA must hold for a DiD estimate to be interpreted as a genuine structural shift in market behavior rather than a continuation of pre-existing trends.
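A minimal sketch of the 2×2 DiD contrast on log prices: the change for one education group minus the change for another. The example plugs in the High and Low tercile medians reported in the post-COVID section (Table 5, Pre- and Post-COVID Median Prices). Treating High as group A is an illustrative choice, not the project’s model specification.

```python
import math


def did_log(pre_a: float, post_a: float, pre_b: float, post_b: float) -> float:
    """2x2 difference-in-differences on log prices: (change in group A) - (change in group B).

    Shocks shared by both groups cancel out if their trends would otherwise be parallel.
    """
    change_a = math.log(post_a) - math.log(pre_a)
    change_b = math.log(post_b) - math.log(pre_b)
    return change_a - change_b


# Group medians from the post-COVID tercile table: High (A) vs. Low (B).
table5_did = did_log(1_031_182, 1_127_420, 517_091, 641_946)
print(round(table5_did, 3))  # negative: High grew less than Low in log terms
```

This sketch uses group medians, not averages of CD-level changes, so its value need not match the percentage-point gaps reported later.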

Logarithmic Transformation

To meaningfully execute this comparison, this analysis implements a logarithmic transformation of property values. Because High-EA CDs begin at significantly higher baselines, analyzing raw dollars would blur the comparison of trends. This standardization process yields relative appreciation rates, ensuring comparability across terciles. For example, a 0.10 log point shift results in an approximate 10% change in value.
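The text moves between log points and percentages, so a small conversion helper makes the relationship explicit. In this illustrative Python sketch, the exact conversion is 100 × (e^Δ − 1), which is close to 100 × Δ only for small changes. Both function names are hypothetical.

```python
import math


def log_points_to_pct(delta_log: float) -> float:
    """Exact percent change implied by a log-point change: 100 * (e^delta - 1)."""
    return 100.0 * math.expm1(delta_log)


def pct_to_log_points(pct: float) -> float:
    """Inverse conversion: log-point change implied by a percent change."""
    return math.log1p(pct / 100.0)


for delta in (0.03, 0.10, 0.25):
    print(f"{delta:.2f} log points = {log_points_to_pct(delta):.1f}%")
```

For example, 0.10 log points corresponds to about 10.5%, and 0.25 log points to about 28.4%. The shorthand of reading log points directly as percent is therefore safest for small changes.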

Figure 1 shows that High-EA CDs start at roughly twice the price level of low-education districts. However, the three trajectories rise along broadly similar paths, which supports the PTA.

Table 7 below reports each group’s pre-COVID annual growth rate, which ranges from 3.0% to 8.2%.

Table 7: Pre-COVID Annual Price Growth by Education Group
Education Group Log-Point Slope Annual % Growth
Low 0.079 8.2
Medium 0.030 3.0
High 0.043 4.4

The largest inter-group slope difference (0.079 - 0.030 = 0.049 log points per year, between Low and Medium) is small. This analysis treats it as satisfying the PTA, which supports reading post-pandemic changes as structural shifts rather than trajectory continuations.

Education and Value: A Lesson on Geography

NYC’s exceptionally diverse EA landscape requires establishing baseline disparities before testing pandemic impacts, as shown in Table 3 below.

Table 3. Distribution of Educational Attainment Across NYC Community Districts (2019)
CDs Mean SD Min 25th %ile Median 75th %ile Max
59 37.7% 19.9% 11.7% 24.4% 34.5% 43.9% 82.5%

EA varies widely across CDs (SD ≈ 19.9 pp), and the gap between the 25th and 75th percentiles is 19.5 pp. This spread supports tercile grouping, which limits the influence of extreme values on group comparisons.

Table 4 below further highlights these disparities. BA+ Attainment ranges from 11.7% in BX01 (the South Bronx) to 82.5% in MN05 (Midtown), a difference of roughly 71 pp.

Table 4. Community Districts with Highest and Lowest Educational Attainment (2019)
Rank CD ID Borough BA+ Attainment
Top 3 MN05 Manhattan 82.5%
Top 3 MN01 Manhattan 81.5%
Top 3 MN08 Manhattan 81.1%
Bottom 3 BX01 Bronx 11.7%
Bottom 3 BX05 Bronx 12.2%
Bottom 3 BX06 Bronx 12.9%

The interactive Leaflet map below highlights EA disparities across CDs. Clicking on an individual CD reveals its BA+ Attainment rate.

The Education Premium

Pre-COVID, there was a strong linear correlation (r = 0.77) between neighborhood EA and its property value.

The simple linear regression below models this premium.

\[\text{Median Price}_{\text{pre}} = \beta_0 + \beta_1 \times \text{BA+\%}_{2019} + \epsilon\]

Table 5a presents the full regression results. Each additional pp of BA+ Attainment predicts a $13,842 higher median sale price, and this effect is statistically significant (α = 0.05, p < 0.001).

Table 5a: Pre-COVID Baseline Regression: Median Sale Price on Educational Attainment
Term Coefficient Std. Error 95% CI Lower 95% CI Upper t-statistic p-value
Intercept 234,656 63,274 107,952 361,361 3.71 < 0.001
BA+ Attainment (%) 13,842 1,486 10,866 16,818 9.31 < 0.001

Consequently, the full regression equation appears as:

\[\widehat{\text{Median Price}}_{\text{Pre-COVID}} = \$234{,}656 + \$13{,}842 \times \text{BA+\%}\]

Moreover, EA explains about 60% of the variation in median sale prices across CDs (R² = 0.603; adjusted R² = 0.596), as shown in Table 5b below.

Table 5b: Model Fit Statistics
R² Adjusted R² Residual SE F-statistic p-value
0.603 0.596 225,270 86.74 < 0.001

Although EA appears as a dominant neighborhood predictor pre-COVID, the analysis below demonstrates a statistically significant disruption to this trend, exposing a sharp post-pandemic reversal in the education premium.


Post-COVID Analysis and Results

The Education Reversal

The post-COVID scatter plot below reveals a weaker but still positive relationship between EA and property values (r = 0.74, down from r = 0.77 pre-COVID).

While high-EA neighborhoods maintain an absolute price advantage, the weaker correlation suggests the education premium diminished during the pandemic years, leading to the rejection of Hypothesis 1.

The regression line’s flatter slope indicates that each additional pp of BA+ Attainment predicts a smaller price differential than in the pre-COVID period. Table 5 provides additional clarity with actual price changes across terciles, revealing which groups appreciated fastest.

Table 5. Pre- and Post-COVID Median Prices and Changes by Education Group
Education Group CDs Pre-COVID Median Post-COVID Median Change ($) Change (%)
Low 20 $517,091 $641,946 $124,854 26.03%
Medium 19 $720,373 $813,311 $92,938 15.03%
High 20 $1,031,182 $1,127,420 $96,237 11.86%
All CDs 59 $756,823 $861,699 $104,875 17.68%

These findings indicate a remarkable reversal of the traditional education premium in growth terms. From 2017–19 to 2021–23, Low-education CDs averaged 26.03% median price growth, whereas Medium- and High-education CDs grew by only 15.03% and 11.86%, respectively. Most notably, High-education CDs grew at less than half the rate of their Low-education counterparts, a 14.2 pp difference.
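The Change (%) column in Table 5 does not equal Change ($) divided by the Pre-COVID Median. For the Low group, $124,854 / $517,091 ≈ 24.1%, not 26.03%. This suggests the column averages CD-level growth rates, which is consistent with Figure 4’s “mean percentage change.” The toy sketch below uses made-up prices to show how those two summaries can diverge. The helper names are hypothetical.

```python
import statistics
from typing import List


def pct_change(pre: float, post: float) -> float:
    return (post - pre) * 100.0 / pre


def mean_of_cd_changes(pre: List[float], post: List[float]) -> float:
    """Average of each CD's own percentage change (every CD weighted equally)."""
    return statistics.mean(pct_change(a, b) for a, b in zip(pre, post))


def change_of_medians(pre: List[float], post: List[float]) -> float:
    """Percentage change between the group's pre- and post-period medians."""
    return pct_change(statistics.median(pre), statistics.median(post))


pre = [100.0, 200.0]   # toy CD median prices, pre-COVID
post = [150.0, 210.0]  # toy CD median prices, post-COVID
print(mean_of_cd_changes(pre, post), change_of_medians(pre, post))
```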

This result rejects Hypothesis 2, as Low-education CDs saw the fastest appreciation. The Leaflet map below highlights this appreciation pattern across all CDs.

To put this reversal in concrete terms: the median sale price in BX07, a low-education CD, rose by approximately $252,500. In MN07, a high-education CD, it rose by roughly $5,000, a gap of about $247,500.

The t-test in Table 6 below confirms this pattern.

Table 6. Difference in Average Price Growth Between High- and Low-Education CDs
Comparison Difference 95% CI t-statistic df p-value
High − Low -14.2 pp [-22.9, -5.4] -3.27 37 0.002

The difference between the High and Low terciles is statistically significant (p = 0.002), and its 95% confidence interval lies entirely below zero. A gap this large would therefore be unlikely if High- and Low-education CDs had the same average growth.
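The comparison can be sketched as a Welch two-sample t-test, which does not assume equal variances. The Welch variant is an assumption here. It is consistent with two facts: the reported df (37) is below the pooled value of 38 for two groups of 20 CDs, and R’s `t.test()` uses Welch by default. The sketch computes only t and df; a p-value would come from the t distribution (e.g., `scipy.stats.t.sf`). The data below are made up.

```python
import math
import statistics
from typing import List, Tuple


def welch_t(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Toy growth rates (%) for two small groups of CDs (made-up numbers).
t_stat, df = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t_stat, df)
```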

Figure 4 below shows the profound magnitude of this reversal.

Figure 4. Property Value Growth by Education Tercile. Mean percentage change in median sale price with 95% confidence intervals. Error bars represent uncertainty in the average CD-level price change within each tercile.

The non-overlapping confidence intervals for the High and Low groups are consistent with the t-test. They indicate that the 14.2 pp divergence reflects distinct economic outcomes rather than a chance anomaly.

Parametric Regression Analysis

The terciles show the magnitude of the reversal. A continuous regression of each CD’s percentage price change on its 2019 BA+ Attainment goes further, quantifying how each incremental percentage point of BA+ influenced post-COVID appreciation.

As shown in Table 7, each additional pp of BA+ Attainment predicts 0.380 pp less price growth. This statistically significant (p < 0.001) negative coefficient contrasts sharply with the pre-COVID pattern, in which higher EA predicted higher prices. In growth terms, the relationship has not merely weakened; it has reversed.

Table 7. Post-COVID Regression: Price Growth on Educational Attainment
Term Coefficient Std. Error 95% CI Lower 95% CI Upper t-statistic p-value
Intercept 32.006 3.340 25.317 38.695 9.58 < 0.001
BA+ Attainment (%) -0.380 0.078 -0.537 -0.223 -4.84 < 0.001

Comparing pre- and post-COVID models reveals this shift’s extent.

In the pre-COVID period, higher education predicted higher absolute prices, with each pp of BA+ Attainment adding nearly $14,000 to median home values.

  • \(\widehat{\text{Median Price}} = \$234{,}656 + \$13{,}842 \times \text{BA+\%}\)

In the post-COVID period, this relationship inverted: higher education predicted slower price appreciation.

  • \(\widehat{\text{Price Change}} = 32.01\% - 0.380 \times \text{BA+\%}\)

Moreover, the post-COVID regression yields an \(R^2\) of 0.291, indicating that baseline EA alone explains about 29% of the variation in price appreciation. This is lower than the pre-COVID model’s \(R^2\), but it still represents meaningful explanatory power for a growth metric. It suggests that EA remained a primary, yet inverted, driver of market disparity during the pandemic.

Table 8. Model Fit Statistics
R² Adjusted R² Residual SE F-statistic p-value
0.291 0.279 11.89 pp 23.42 < 0.001

Internal Consistency Check

The tercile-based and regression-based approaches should yield consistent estimates if the education-growth relationship is approximately linear.

From Table 7, each additional pp of BA+ Attainment predicts a -0.380 pp change in price growth. Moreover, the average education gap between the High and Low terciles is approximately 40.4 pp (Table 9; see also the tercile means in Table 6, Education Tercile Ranges).

Using the regression coefficient, it is possible to predict the expected difference:

\[\text{Predicted Difference} = \text{Education Gap} \times \text{Regression Coefficient}\]

\[\text{Predicted Difference} = 40.4 \text{ pp} \times (-0.380) \approx -15.35 \text{ pp}\]

This means the regression model predicts that High-EA CDs should grow roughly 15 pp less than Low-EA CDs (Table 9 reports -15.3 pp).

From Table 5 (Pre- and Post-COVID Median Prices and Changes by Education Group), Low-EA CDs actually grew 14.2 pp more than High-EA CDs (26.03% - 11.86% = 14.17 pp ≈ 14.2 pp).

Table 9 compares these two estimates to assess internal consistency.

Table 9. Internal Consistency Check: Tercile DiD vs. Continuous Regression
Quantity Value
Average education gap (High − Low) 40.4 pp
Regression coefficient (pp per 1% BA+) -0.38
Predicted DiD from regression -15.3 pp
Observed DiD from tercile table 14.2 pp

The regression-based prediction (about -15 pp, from High’s perspective) closely matches the observed tercile difference (+14.2 pp, from Low’s perspective). The two estimates differ by only about 1 pp in magnitude, which supports internal consistency between the non-parametric (tercile) and parametric (regression) approaches.
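Putting both estimates on the same High-minus-Low sign convention makes the comparison direct. The sketch below uses the rounded inputs from Tables 7 and 9 and the post-COVID Table 5, so it reproduces Table 9 only up to rounding. The helper `predicted_gap` is hypothetical.

```python
def predicted_gap(education_gap_pp: float, slope: float) -> float:
    """Growth gap (High minus Low, in pp) implied by a linear slope across an education gap."""
    return education_gap_pp * slope


pred = predicted_gap(40.4, -0.380)  # Table 9 gap and Table 7 slope (rounded)
observed = 11.86 - 26.03            # High minus Low mean growth, post-COVID Table 5
print(f"predicted {pred:.2f} pp, observed {observed:.2f} pp, gap {abs(pred - observed):.2f} pp")
```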

Whether comparing discrete education groups or modeling continuous relationships, the conclusion remains the same: Low-EA CDs experienced substantially faster price growth during the post-COVID period, with the magnitude of this reversal measuring approximately 14-15 pp.


Robustness: Citywide Validation by Borough

The previous sections highlighted stark differences in EA and property values, particularly between Manhattan and the Bronx. However, it is essential to examine whether this inverted education-growth relationship holds across all five boroughs, as each has a different demographic composition and a distinct pandemic experience.

Table 10. Distribution of CD-Level Price Growth by Borough
Borough CDs Mean Change SD Min Max
Manhattan 12 6.4% 15.9% -16.1% 45.7%
Bronx 12 31.7% 11.7% 4.8% 51.5%
Staten Island 3 22.6% 1.5% 21.6% 24.3%
Queens 14 16.9% 11.7% 2% 38.2%
Brooklyn 18 15.7% 8.6% 1.1% 39.4%

Table 10 shows that average property value growth was positive in all five boroughs, although growth rates varied widely and some Manhattan CDs declined (minimum -16.1%). The Bronx had the highest mean growth (31.7%). Manhattan had the lowest mean growth (6.4%) and the greatest dispersion (\(SD = 15.9\%\)). Queens and the Bronx showed similar dispersion (\(SD = 11.7\%\)), while Brooklyn was relatively more consistent (\(SD = 8.6\%\)).

Table 11 quantifies how EA impacts property values within each borough.

Table 11. Education–Growth Relationship by Borough
Borough CDs Correlation (r) Slope (pp per 1% BA+)
Staten Island 3 -0.991 -0.682
Bronx 12 -0.346 -0.476
Manhattan 12 -0.389 -0.298
Queens 14 -0.221 -0.230
Brooklyn 18 -0.009 -0.005

All five boroughs show negative correlations and negative slopes between EA and price growth, so the pattern extends beyond any single market. Its strength, however, varies widely, from near zero in Brooklyn to nearly perfect in Staten Island.

Staten Island (SI) exhibits the strongest negative relationship (r = -0.991, slope = -0.682), although with only three CDs this estimate is highly unstable.

Brooklyn’s near-zero slope (-0.005) and correlation (-0.009) suggest that other factors (e.g., gentrification) drove appreciation more than EA did. Its proximity to Manhattan may also have outweighed EA in determining appreciation. Consequently, Brooklyn’s distinct outcome calls for deeper investigation in future research.

Figure 5 visualizes these borough-specific slopes for direct comparison.

This figure shows that no borough experienced the traditional positive education-growth relationship during the post-COVID period. In four of five boroughs the relationship was clearly negative, with slopes ranging from -0.230 (Queens) to -0.682 (Staten Island).

Discussion

Interpretation

Several factors likely explain this citywide reversal of the education premium.

Remote work reduced the need to live near the employment-rich, amenity-filled hubs that are usually concentrated in high-EA CDs. Affordability pressures also led buyers to migrate toward undervalued outer-borough CDs, which offered greater access to homeownership. Lastly, lower-valued CDs may have experienced catch-up growth. This is consistent with a “donut effect,” in which demand shifts from the urban core toward the periphery and raises prices in once-affordable CDs.

These factors likely worked together as a feedback loop. As the agglomeration premium of high-EA CDs diminished, market demand shifted toward peripheral, outer-borough markets. This demand surge had a compounding effect, accelerating appreciation in Low-EA districts while High-EA growth slowed.

It is essential to clarify that this analysis highlights a change in growth rates rather than in the price hierarchy. High-EA neighborhoods maintained their absolute price dominance throughout the pandemic; the reversal occurred in appreciation rates. However, it remains unclear whether these findings reveal a permanent value change or a temporary disruption.

Contribution to Overarching Question

This analysis indicates that COVID-19 reshaped the relationship between neighborhood characteristics and property values by reversing the education premium, historically a strong predictor of urban real estate prices. This finding also connects with the team’s individual analyses of other neighborhood characteristics. This research:

  • Aligns with transit findings: Both accessibility and education premiums weakened.

  • Contextualizes density analysis: The apparent density penalty may be partly driven by education rather than by density itself.

  • Contrasts with job accessibility: Job accessibility remained stable, while high-EA CDs lost their premium and low-EA CDs gained value.

  • Provides baseline context for crime results: Heterogeneous effects by initial crime conditions mirror our tercile patterns.

The education reversal appears to be the dominant structural shift, with other characteristics showing stability (jobs) or modest weakening (density and transit). This suggests that pandemic housing dynamics recalibrated a longstanding urban economic geography.

Limitations and Conclusion

While this analysis provides clear evidence of a post-pandemic reversal, several limitations qualify the results. First, CD-level data may obscure location-specific variation, such as gentrification within Low-EA areas. Second, residents of high-EA CDs may have held onto their properties, so low growth there may reflect limited inventory that masks retention premiums. Finally, because the data extend only through 2023, it is unclear whether these findings reflect a temporary disruption or a permanent shift, especially given recent employer mandates calling employees back to physical work locations.

Despite these limitations, this research indicates that the relationship between neighborhood EA and property value growth in NYC reversed between the pre- and post-COVID periods. The pandemic appears not only to have disrupted the usual logic of NYC property values but also to have reshaped buyer preferences. Combined with affordability pressures, this produced a new urban landscape. In it, traditional high-EA clusters, historically a driver of property value, matter less as demand shifts toward the value offered by outer-borough CDs.

